Zika Virus in Rio de Janeiro Analysis


In [22]:
# library imports
import pandas as pd
import folium  # https://github.com/python-visualization/folium/tree/master/examples
from folium.plugins import MarkerCluster  #https://github.com/python-visualization/folium/tree/master/examples
import matplotlib.pyplot as plt

%matplotlib inline

1-Data Acquisition


In [4]:
path = '/home/lserra/Temp/'
filename = 'Mosquito.Borne.Disease.tar.gz'
In [5]:
disease = pd.read_csv(path + filename, compression='gzip', header=0)
/home/lserra/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (42,43,44,45) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
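The file name ends in `.tar.gz`, yet the cell above reads it with `compression='gzip'`, which only undoes the gzip layer and leaves the tar container in the stream. That is a plausible reason why the first column ends up named after the archive member (`Mosquito.Borne.Disease.csv`), and it may also contribute to the mixed-type warning. A minimal sketch of an alternative, assuming the archive holds a single CSV member: open the archive with the standard `tarfile` module and hand the member to pandas, passing `low_memory=False` as the warning suggests. The helper names `read_csv_from_tar_gz` and `make_demo_archive` and the demo rows are hypothetical and exist only so the sketch runs without the real file.

```python
import io
import tarfile

import pandas as pd


def read_csv_from_tar_gz(archive, member_name, **read_csv_kwargs):
    """Read one CSV member out of a .tar.gz archive into a DataFrame.

    `archive` may be a filesystem path (str) or a binary file-like object.
    """
    open_kwargs = {"name": archive} if isinstance(archive, str) else {"fileobj": archive}
    with tarfile.open(mode="r:gz", **open_kwargs) as tar:
        member = tar.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name!r} is not a regular file in the archive")
        # The member is fully parsed before the archive is closed.
        return pd.read_csv(member, **read_csv_kwargs)


def make_demo_archive():
    """Build a tiny in-memory .tar.gz (hypothetical rows, not the real dataset)."""
    csv_bytes = (
        b"ID_AGRAVO,DT_NOTIFIC,latitude,longitude\n"
        b"A90,2015-09-17,-22.92,-43.38\n"
        b"A928,2016-03-01,-22.84,-43.28\n"
    )
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="demo.csv")
        info.size = len(csv_bytes)
        tar.addfile(info, io.BytesIO(csv_bytes))
    buffer.seek(0)
    return buffer


# low_memory=False makes pandas infer each column's dtype from the whole file.
demo = read_csv_from_tar_gz(make_demo_archive(), "demo.csv", low_memory=False)
print(demo)
```

With the real data the call would look like `read_csv_from_tar_gz(path + filename, 'Mosquito.Borne.Disease.csv', low_memory=False)`; the member name here is an assumption inferred from the column header, not something the notebook confirms.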
In [6]:
disease.head()
Out[6]:
Mosquito.Borne.Disease.csv ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI ... NU_LOTE_V NU_LOTE_H CS_FLXRET FLXRECEBI IDENT_MICR MIGRADO_W TP_SISTEMA latitude longitude NM_DISEASE
0 1.0 A90 2015-09-17 37.0 2015.0 33.0 330455.0 NaN 2296608.0 2015-08-26 ... NaN NaN 1.0 NaN 4.0 NaN 2.0 -22.925318 -43.388889 Dengue
1 2.0 A90 2015-05-15 19.0 2015.0 33.0 330455.0 NaN 7722494.0 2015-05-10 ... NaN NaN 0.0 NaN 4.0 NaN 2.0 -22.872878 -43.425393 Dengue
2 3.0 A90 2015-08-14 32.0 2015.0 33.0 330455.0 NaN 2291266.0 2015-08-08 ... NaN NaN 1.0 NaN 4.0 NaN 2.0 -22.826738 -43.329702 Dengue
3 4.0 A90 2015-08-18 33.0 2015.0 33.0 330455.0 NaN 3134822.0 2015-06-19 ... NaN NaN 0.0 NaN 4.0 NaN 2.0 -22.879648 -43.339151 Dengue
4 5.0 A90 2015-08-18 33.0 2015.0 33.0 330455.0 NaN 3134822.0 2015-06-18 ... NaN NaN 1.0 NaN 4.0 NaN 2.0 -22.814463 -43.380845 Dengue

5 rows × 56 columns

2-Data Transformation


In [7]:
disease.columns
Out[7]:
Index(['Mosquito.Borne.Disease.csv', 'ID_AGRAVO', 'DT_NOTIFIC', 'SEM_NOT',
       'NU_ANO', 'SG_UF_NOT', 'ID_MUNICIP', 'ID_REGIONA', 'ID_UNIDADE',
       'DT_SIN_PRI', 'SEM_PRI', 'DT_NASC', 'NU_IDADE_N', 'CS_SEXO',
       'CS_GESTANT', 'CS_RACA', 'CS_ESCOL_N', 'SG_UF', 'ID_MN_RESI',
       'ID_RG_RESI', 'ID_DISTRIT', 'ID_BAIRRO', 'NM_BAIRRO', 'NU_CEP',
       'NU_DDD_TEL', 'CS_ZONA', 'ID_PAIS', 'NDUPLIC_N', 'DT_INVEST',
       'ID_OCUPA_N', 'CLASSI_FIN', 'CRITERIO', 'TPAUTOCTO', 'COUFINF',
       'COPAISINF', 'COMUNINF', 'CODISINF', 'DOENCA_TRA', 'EVOLUCAO',
       'DT_OBITO', 'DT_ENCERRA', 'DT_DIGITA', 'DT_TRANSSM', 'DT_TRANSRM',
       'DT_TRANSRS', 'DT_TRANSSE', 'NU_LOTE_V', 'NU_LOTE_H', 'CS_FLXRET',
       'FLXRECEBI', 'IDENT_MICR', 'MIGRADO_W', 'TP_SISTEMA', 'latitude',
       'longitude', 'NM_DISEASE'],
      dtype='object')
In [8]:
disease.dtypes
Out[8]:
Mosquito.Borne.Disease.csv    float64
ID_AGRAVO                      object
DT_NOTIFIC                     object
SEM_NOT                       float64
NU_ANO                        float64
SG_UF_NOT                     float64
ID_MUNICIP                    float64
ID_REGIONA                    float64
ID_UNIDADE                    float64
DT_SIN_PRI                     object
SEM_PRI                       float64
DT_NASC                        object
NU_IDADE_N                    float64
CS_SEXO                        object
CS_GESTANT                    float64
CS_RACA                       float64
CS_ESCOL_N                    float64
SG_UF                         float64
ID_MN_RESI                    float64
ID_RG_RESI                    float64
ID_DISTRIT                    float64
ID_BAIRRO                     float64
NM_BAIRRO                      object
NU_CEP                        float64
NU_DDD_TEL                    float64
CS_ZONA                       float64
ID_PAIS                       float64
NDUPLIC_N                     float64
DT_INVEST                      object
ID_OCUPA_N                    float64
CLASSI_FIN                    float64
CRITERIO                      float64
TPAUTOCTO                     float64
COUFINF                       float64
COPAISINF                     float64
COMUNINF                      float64
CODISINF                      float64
DOENCA_TRA                    float64
EVOLUCAO                      float64
DT_OBITO                       object
DT_ENCERRA                     object
DT_DIGITA                      object
DT_TRANSSM                     object
DT_TRANSRM                     object
DT_TRANSRS                     object
DT_TRANSSE                     object
NU_LOTE_V                     float64
NU_LOTE_H                     float64
CS_FLXRET                     float64
FLXRECEBI                     float64
IDENT_MICR                    float64
MIGRADO_W                     float64
TP_SISTEMA                    float64
latitude                      float64
longitude                     float64
NM_DISEASE                     object
dtype: object
In [9]:
new_disease = disease[['ID_AGRAVO', 'DT_NOTIFIC']].copy()
In [10]:
new_disease['DT_NOTIFIC'] = pd.to_datetime(new_disease['DT_NOTIFIC'])

3-Slicing Data


In [11]:
# keeping only the Zika notifications (ID_AGRAVO code 'A928')
new_disease = new_disease[new_disease['ID_AGRAVO'] == 'A928']

4-Grouping Data


In [12]:
# number of Zika cases notified per day in the State of Rio de Janeiro, from Jan/2015 to Sep/2016
# (no set_index call is needed: groupby already turns DT_NOTIFIC into the index)
new_disease_counts = new_disease.groupby('DT_NOTIFIC').count()
new_disease_counts.rename(columns={'ID_AGRAVO': 'DISEASES QTY'}, inplace=True)
new_disease_counts.head()
Out[12]:
            DISEASES QTY
DT_NOTIFIC
2015-01-05             1
2015-01-08             1
2015-01-10             1
2015-01-14             2
2015-01-20             1
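The same daily count can be expressed compactly with `groupby(...).size()`. The sketch below uses a few made-up rows (hypothetical data; only the column names mirror the notebook) to show the filter and the grouping together.

```python
import pandas as pd

# Hypothetical sample rows; only the column names mirror the notebook.
sample = pd.DataFrame({
    "ID_AGRAVO": ["A928", "A928", "A90", "A928"],
    "DT_NOTIFIC": ["2015-01-14", "2015-01-14", "2015-01-14", "2015-01-20"],
})
sample["DT_NOTIFIC"] = pd.to_datetime(sample["DT_NOTIFIC"])

# Keep the Zika code, then count rows per notification date.
zika = sample[sample["ID_AGRAVO"] == "A928"]
daily_counts = (
    zika.groupby("DT_NOTIFIC")
    .size()                   # number of rows in each group
    .rename("DISEASES QTY")   # name the resulting Series
    .to_frame()               # back to a one-column DataFrame
)
print(daily_counts)
```

`size()` counts rows regardless of missing values, whereas `count()` counts the non-null entries of each column; for a column with no gaps the two agree.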

5-Plotting the first numbers


In [13]:
new_disease_counts.plot(figsize=(15,5))
# the counts are grouped by notification date (daily), not by month
plt.title('Zika Virus Cases by Notification Date')
plt.ylabel('Number of Cases')
plt.xlabel('Notification Date')
plt.show()

As the plot shows, the number of cases found and registered increases between Jan/2016 and May/2016, a period that spans the Brazilian summer and the start of autumn. This is the period with the highest incidence of cases.
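To check a statement like this in numbers rather than by eye, the daily counts can be summed per calendar month. A minimal sketch, assuming a Series of daily counts indexed by notification date; the values below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical daily counts indexed by notification date (not real data).
daily = pd.Series(
    [1, 2, 5, 7, 3],
    index=pd.to_datetime(
        ["2016-01-05", "2016-01-20", "2016-02-10", "2016-02-11", "2016-06-01"]
    ),
    name="DISEASES QTY",
)

# Group the daily counts by calendar month and add them up.
monthly = daily.groupby(daily.index.to_period("M")).sum()

# Month with the largest total.
peak_month = monthly.idxmax()
print(monthly)
print("peak month:", peak_month)
```

On the notebook's data the same two lines could be applied to the `DISEASES QTY` column of `new_disease_counts`.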

6-Drill Down in the Data


In [14]:
disease_rj = disease.copy()
In [15]:
disease_rj = disease_rj[(disease_rj['ID_AGRAVO'] == 'A928')]
In [16]:
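# DT_NOTIFIC still holds 'YYYY-MM-DD' strings here; they sort chronologically,
# so this string comparison selects the notifications from March 2016
disease_rj = disease_rj[(disease_rj['DT_NOTIFIC'] >= '2016-03-01') & (disease_rj['DT_NOTIFIC'] <= '2016-03-31')]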
In [17]:
disease_rj['DT_NOTIFIC'] = pd.to_datetime(disease_rj['DT_NOTIFIC'])
In [18]:
disease_rj = disease_rj.drop('Mosquito.Borne.Disease.csv', axis=1)
In [19]:
disease_rj.set_index('DT_NOTIFIC').head()
Out[19]:
ID_AGRAVO SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC ... NU_LOTE_V NU_LOTE_H CS_FLXRET FLXRECEBI IDENT_MICR MIGRADO_W TP_SISTEMA latitude longitude NM_DISEASE
DT_NOTIFIC
2016-03-01 A928 9.0 2016.0 33.0 330455.0 NaN 6038913.0 2016-02-26 8.0 1990-01-07 ... 2016013.0 NaN 0.0 2.0 3.304550e+10 NaN NaN -22.84793 -43.28491 Zika
2016-03-04 A928 9.0 2016.0 33.0 330455.0 NaN 2273616.0 2016-03-03 9.0 1981-02-17 ... 2016015.0 NaN 0.0 2.0 3.304550e+10 NaN NaN -22.95024 -43.66628 Zika
2016-03-21 A928 12.0 2016.0 33.0 330455.0 NaN 6995462.0 2016-03-19 11.0 1983-10-20 ... 2016015.0 NaN 0.0 2.0 3.304550e+10 NaN NaN -22.81008 -42.11890 Zika
2016-03-11 A928 10.0 2016.0 33.0 330455.0 NaN 2270072.0 2016-03-11 10.0 1947-07-02 ... 2016013.0 NaN 0.0 2.0 3.304550e+10 NaN NaN -22.98594 -43.24359 Zika
2016-03-16 A928 11.0 2016.0 33.0 330455.0 NaN 2270277.0 2016-03-13 11.0 1954-05-02 ... 2016014.0 NaN 0.0 2.0 3.304550e+10 NaN NaN -23.00192 -43.63829 Zika

5 rows × 54 columns
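The drill-down filters on date strings and converts the column afterwards, which works only because the strings are in `YYYY-MM-DD` form. A slightly more robust variant, sketched here on made-up rows (hypothetical data), converts first and then compares real dates.

```python
import pandas as pd

# Hypothetical sample rows; only the column names mirror the notebook.
sample = pd.DataFrame({
    "ID_AGRAVO": ["A928", "A928", "A928", "A90"],
    "DT_NOTIFIC": ["2016-02-29", "2016-03-01", "2016-03-31", "2016-03-15"],
})

# Convert first, so the comparison below is made on dates, not on strings.
sample["DT_NOTIFIC"] = pd.to_datetime(sample["DT_NOTIFIC"], format="%Y-%m-%d")

is_zika = sample["ID_AGRAVO"] == "A928"
# between() includes both endpoints by default.
in_march_2016 = sample["DT_NOTIFIC"].between(
    pd.Timestamp("2016-03-01"), pd.Timestamp("2016-03-31")
)

# .copy() gives an independent frame that can be modified safely later.
march_zika = sample[is_zika & in_march_2016].copy()
print(march_zika)
```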

In [20]:
# total number of Zika cases notified in the State of Rio de Janeiro in March 2016
disease_rj['ID_AGRAVO'].count()
Out[20]:
7901
In [25]:
# map of the cases found and registered in the State of Rio de Janeiro in March 2016 only;
# each case is drawn as a small square marker at its latitude/longitude
disease_map = folium.Map(location=[-22.914921, -43.194043])
for d in disease_rj[['latitude', 'longitude']].values.tolist():
    folium.RegularPolygonMarker(location=d, popup='', fill_color='#769d96',
                                number_of_sides=4, radius=5).add_to(disease_map)
disease_map
Out[25]:
[interactive folium map of the Zika cases registered in March 2016]
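With several thousand points, drawing one marker per case makes the map heavy, and rows with a missing coordinate may be rejected by folium. The notebook imports `MarkerCluster` but never uses it; the sketch below shows one possible way to put it to work. The helper names `valid_points` and `build_cluster_map`, the zoom level, and the sample rows are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd


def valid_points(frame, lat_col="latitude", lon_col="longitude"):
    """Return [lat, lon] pairs, skipping rows where either coordinate is missing."""
    coords = frame[[lat_col, lon_col]].dropna()
    return coords.values.tolist()


def build_cluster_map(points, center, zoom_start=10):
    """Place the points on a folium map inside a single MarkerCluster layer."""
    # Imported here so the coordinate helper above can be used without folium.
    import folium
    from folium.plugins import MarkerCluster

    cluster_map = folium.Map(location=center, zoom_start=zoom_start)
    cluster = MarkerCluster().add_to(cluster_map)
    for point in points:
        folium.Marker(location=point).add_to(cluster)
    return cluster_map


# Hypothetical rows: the ones with a missing coordinate are dropped.
sample = pd.DataFrame({
    "latitude": [-22.84793, -22.95024, None],
    "longitude": [-43.28491, None, -43.63829],
})
points = valid_points(sample)
print(points)
```

In the notebook this could be called as `build_cluster_map(valid_points(disease_rj), center=[-22.914921, -43.194043])`; nearby markers are then grouped into clusters that split apart as the reader zooms in.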

These are my notes; they can be expanded on, and contributions are welcome from anyone who is interested. If you would like to contact me with questions, suggestions, or criticism, or to learn about my services, please send me an e-mail at laercio.serra@gmail.com, or visit my website for more details: http://lserra.datafresh.com.br
